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Abstract — Network detection is an important capability 
in many areas of applied research in which data can be 
represented as a graph of entities and relationships. Often- 
times the object of interest is a relatively small subgraph in 
an enormous, potentially uninteresting background. This 
aspect characterizes network detection as a "big data" 
problem. Graph partitioning and network discovery have 
been major research areas over the last ten years, driven by 
interest in internet search, cyber security, social networks, 
and criminal or terrorist activities. The specific problem of 
network discovery is addressed as a special case of graph 
partitioning in which membership in a small subgraph 
of interest must be determined. Algebraic graph theory is 
used as the basis to analyze and compare different network 
detection methods. A new Bayesian network detection 
framework is introduced that partitions the graph based 
on prior information and direct observations. The new 
approach, called space-time threat propagation, is proved 
to maximize the probability of detection and is therefore 
optimum in the Neyman-Pearson sense. This optimaUty 
criterion is compared to spectral community detection 
approaches which divide the global graph into subsets 
or communities with optimal connectivity properties. We 
also explore a new generative stochastic model for covert 
networks and analyze using receiver operating character- 
istics the detection performance of both classes of optimal 
detection techniques. 



I. Introduction 

Network detection is a special class of the more 
general graph partitioning (GP) problem in which the 
binary decision of membership or non-membership for 
each graph vertex must be determined. This detection 
problem and more generally GP are of fundamental and 
practical importance in graph theory and its applications 
(Figure [T]|. The detected subgraph comprises all vertices 
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declared to be members. The very definition of member- 
ship will lead to specific network detection algorithms. 

Graph partitioning is an NP-hard problem; however, 
semidefinite programming (SDP) relaxation applies to 
many cases, offering both practical and oftentimes the- 
oretically attractive approximation to GP. [32| , |56[ In 
general, practical GP approaches exploit a variety of 
global and local connectivity properties to divide a graph 
into many subgraphs. Decreasing algorithmic complexity 
is achieved in certain domains that may be cast as 
quadratic optimization problems (yielding eigenvalue- 
or spectral-based methods), or simple sets of linear 
equations. One important network detection approach, 
called community detection, divides the global graph 
into subsets or communities based on optimizing a 
specific connectivity measure that is chosen depending 
upon the application. This paper presents a new Bayes- 
ian network detection approach called space-time threat 



propagation |40|, |47| that is shown to optimize the 
probability of network detection in a Neyman-Pearson 
sense given prior information and/or direct observations, 
i.e. detection probability is maximized given a fixed 
false alarm a.k.a. false positive probability. This is an 
important property because it provides a practical op- 
timum algorithm in many settings (satisfying a set of 
assumptions detailed later in the paper), and it provides 
a performance bound on detection performance. Remark- 
ably, the two apparently different optimal network detec- 
tion approaches are related to each other using insights 
from algebraic graph theory. Converse to other research 
on network detection, rather than using the network to 
detect signals, |j3|, |10|, |27| the signal of interest in 
this paper is the signal to be detected. In this sense 
the paper is also related to work on so-called manifold 
learning methods, |[5|, |[8|, |12| although the network to 
be detected is a subgraph of an existing network, and 
therefore the methods described here belong to a class of 
network anomoly detection [8] as well as maximimum- 
likelihood methods for network detection. |17] Both 
spectral-based and Neyman-Pearson network detection 
methods are described and analyzed below in Sections 



III and IV Furthermore, network detection performance 



is assessed using a new stochastic blockmodel Q for 
small, dynamic foreground networks embedded within a 
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Fig. 1. Network detection algorithm taxonomy. This paper focuses on local spectral and harmonic methods for network detection. (Images 
used from various sources; clockwise from upper left (6l|48|[24l|38[|23[|44[|58|). 



large background. 

A. Covert Networks 

Detection of network communities is most likely to 
be effective if the communities exhibit high levels of 
connection activity. However, the covert networks of 
interest to many applications are unlikely to cooperate 
with this optimistic assumption. Indeed, a "fully con- 
nected network ... is an unlikely description of the 
enemy insurgent order of battle." |50| A clandestine 
or covert community is more likely to appear cellular 
and distributed. |7| Communities of this type can be 
represented with "small world" models. |43| The covert 
networks of interest in this paper exist to accomplish 
nefarious, illegal, or terrorism goals, while "hiding in 



plain sight." ]26| , |57| Covert networks necessarily adopt 
operational procedures to remain hidden and robustly 
adapt to losses of parts of the network. For example, 
during the Algerian Revolution the FLN's Autonomous 
Zone of Algiers (Z.A.A.) military command was "care- 
fully kept apart from other elements of the organization, 
the network was broken down into a number of quite 
distinct and compartmented branches, in communication 
only with the network chief," allowing Z.A.A. leader 
Yassef Saadi to command "within 200 yards from the 
office of the [French] army commandant . . . and remain 



9/11 terrorist network details the strategy for keeping cell 
members distant from each other and from other cells 
and notes bin Laden's description of this organization: 
"those . . . who were trained to fly didn't know the others. 
One group of people did not know the other group." 
pOl A covert network does not have to be human to be 
nefarious; the widespread Flashback malware attack on 
Apple's OS X computers employed switched load bal- 



ancing between servers to avoid detection, 1 14 1 mirroring 
the Z.A.A.'s "tree" structure for robust covert network 
organization. 

In order to accomplish its goals the covert network 



there several months[.]" |49 1 Krebs' reconstruction of the 



must judiciously use "transitory shortcuts." |53| For 
example, in the 9/11 terrorism operation, after coordi- 
nation meetings connected distant parts of the network, 
the "cross-ties went dormant." |30] It is during these 
occasional bursts of connection activity that a covert 
community may be most vulnerable to detection. |50| 

Network detection is predicated on the existence of 
observations of network relationships. In this paper the 
focus will be on observations of network activities us- 
ing Intelligence, Surveillance, and Reconnaissance (ISR) 
sensors, such as Wide- Area Motion Imagery (WAMI). 
Covert networks engaged in terrorist attacks with Im- 
provised Explosive Devices (lEDs) comprise loosely 
connected cells with various functions, such as finance, 
planning, operations, logistics, security, and propaganda. 
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In this paper a new model of covert threat for detec- 
tion analysis that accounts for the realities of dynamic 
foreground networks in large backgrounds is a specially 
adapted version of a mixed membership stochastic block- 
model. [2] The terrorist cells of interest are embedded 
into a background consisting of many "neutral" commu- 
nities, that represent business, homes, industry, religion, 
sports, etc. Because in real life people wear different 
"hats" depending upon on the communities with which 
they interact, their proportions of membership in multi- 
ple communities (lifestyles) can be adjusted to control 
the occasional coordination between the foreground and 
background networks. The new generative blockmodel 



approach introduced in Section IV-A leads to a analyti- 
cally tractable tool with sufficient parameters to exhibit 
realistic coordinated activity levels and interactions. 

B. Observability and Detectability 

The connections (edges) between nodes of a network 
are observable only when they are active. This implies 
that there are two basic strategies for detecting a covert 
threat: (1) subject-based Bayesian models that correlate 
a priori information or observations of the observed net- 
work connections; (2) pattern-based (predictive) methods 
that look for known patterns of organization/behavior to 



infer nefarious activity. p6| , | |41J Subject-based methods 
follow established principles of police investigations to 
accrue evidence based upon observed connections and 
historical data. The dependency of predictive methods on 
known patterns, however, makes them difficult to apply 
to rare and widely different covert threats: "there are no 
meaningful patterns that show what behavior indicates 
planning or preparation for terrorism." |26| The real- 
world consequences of applying an inappropriate model 
to detect a threat may include an unacceptable number 
of false positives and an erosion of individual privacy 



rights and civil liberties. | |26| , | [4T | 

As described above, the subject of community detec- 
tion in graphs has experienced extensive research during 
the last ten years. |21j, g5), ||29), |l38|, g9| Never- 
theless, there are few closed-form results that quantify 
the limits of detectability of specific types of networks 
in representative backgrounds. Fully connected networks 
(cliques) have received special attention: there is a recent 
result which confirms in closed form using random 
matrix theory the previously observed phase transition 



of detectability for sufficiently small cliques |20|, |31|, 
|[37 | or dense subgraphs |j4|. 

In this paper we use the proposed generative stochastic 
threat model with Monte Carlo detection performance 
analysis. The detection methodologies under investiga- 



tion here include the spectral-based and Neyman-Pearson 
techniques discussed above in the Introduction. 

II. Algebraic Graph Theory 

A graph G = {V, E) is defined by two sets, the 
vertices V of G, and the edges E C \Vf C 2^^ 
of G, in which [F]^ denotes the set of 2-element subsets 
of V. For example, the sets V = {1, 2, 3}, 
-E = { {1, 2}, {2, 3} } describe a simple graph with 
undirected edges between vertices 1 and 2, and 2 and 3: 
(T) — (2) — (3). The adjacency matrix A = A(G) of G 
is the {0, 1} -matrix with Aij = 1 iff G E. In 

the example, A = ^ 1 1 ^ Because simple graphs 
are undirected, their adjacency matrix is necessarily 
symmetric. The degree matrix D = Diag(A-l) is the 
diagonal matrix of the vector of degrees of all vertices, 
where 1 = (1,...,!)^ is the vector of all ones. 

Many important applications involve an orientation be- 
tween vertices, defined by an orientation map a: [1^]^ — )■ 
V xV (the ordered Cartesian product of V with itself) 
in which the first and second coordinates are called 
the initial and terminal vertices, respectively. The corre- 
sponding directed graph is denoted G" or, by abuse of 
notation, simply G. The preceding example with orien- 
tation map cr({l, 2}) = (2, 1), cr({2, 3}) = (2,3) yields 
the directed graph (T)( — (2) — >(?)■ The incidence matrix 
B = B(G'^) of the oriented graph G"" is the (0, ±1)- 
matrix of size ^V-hy-#E with Bjg = —1 if i is an 
initial vertex of o"(e), 1 if i is a terminal vertex of (7{e), 
and otherwise. In the example, B = ^-1 -i^ In the 
study of homology in algebraic topology, the incidence 
matrix is recognized as the boundary operator on graph 
edges. It encodes differences between vertices and plays 
an important role in the analysis of network detection 
algorithms through the so-called graph Laplacian, which 
appears in three forms. The unnormalized Laplacian 
matrix or Kirchhoff matrix of a graph G is the matrix 



Q = Q(G) = BB^ = D - A, 



(1) 



where B(G'^) is the incidence matrix of an oriented 
graph with (arbitrary) orientation a, and A(G) 

and D(G) are, respectively, the adjacency and degree 

/ 1-1 On 

matrices of G. In the example, Q = I -1 2 -1 1. The 
(normalized) Laplacian matrix 

L = D-i/2QD-i/2 = i_d-i/2ad-i/2 (2) 

is a matrix congruence of the Kirchhoff matrix Q scaled 
by the square-root of the degree matrix D^/^. The 
generalized or asymmetric Laplacian matrix 

L = D^/2lD^/2^D-^Q = I-D^A (3) 
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is a similarity transformation of the Laplacian matrix. 

/ 1 -2-1/2 \ 

In the example, L = -i-^'"^ i -2-^i'^ and L = 

V -2-1/2 1 ) 

1 -1 \ 

2-1 1-2-1 -pf^e latter example is immediately 

0-11/ 

recognized as a discretization of the second deriva- 
tive —d?/dx^, i.e. the negative of the 1-d Laplacian 
operator A = 5^/ dx^ + d"^/ dy^ + • • • that appears 
in numerous physical applications. (This sign is the 
convention used in graph theory.) The asymmetric Lapla- 
cian L = I — D^^A plays an important role in mean- 
value theorems involving solutions to Laplace's equation 
Lv = 0, which will be seen to be the motivating equation 
behind several network detection algorithms. 

The connection between the incidence and Lapla- 
cian matrices and physical applications is made through 
Green's first identity, which equates the continuous 
Laplacian operator A in terms of the vector gradient 
V = {d/dx, d/dy, . . .y and motivates the definition 
Q = BB^ of the graph Laplacian. Given two arbitrary 
"test" functions /(x) and g{-x.) on a bounded domain 
C M" with boundary (9^2 and inner product ( , ) , 
Green's first identity asserts, 

[ gAfdV = - [ {Vg,Vf)dV+ [ g{Vf,ii)dS, 
Jn Jn Jan 

(4) 

where dV and ndS are the volume and directed sur- 
face differentials — this formula generalizes immediately 
to Riemannian manifolds. Applying the finite element 
method to this continuous equation yields a graph 
arising from, say, Delaunay triangulation and a ma- 
trix equation involving the graph Laplacian matrix L 
[from gAf in Eq. Q] and the normalized outer prod- 
uct D^^/^BB^D~^^ of the incidence matrix [from 
{Vg,Vf) in Eq. Q]. This illustrates that the graph 
Laplacian is the standard Laplacian of physics and 
mathematics, a connection that explains many theoretical 
and performance advantages of the normalized Laplacian 



over the Kirchhoff matrix across applications. 1 11 1, |45 1, 



|52|, |54|, 1551 



The most important property of the Laplacian matrix 
is that the constant vector 1 = (1, . . . , 1)^ is in the kernel 
of the Laplacian, 



Q1 = 0; L1 = 0, 



(5) 



i.e. 1 is an eigenvector of Q and L whose eigenvalue 
is zero. This property is the reason for the mean-value 
property of harmonic functions, as well as the fact that 
the only bounded harmonic functions on an unbounded 
domain are necessarily constant, which will play an 
important role in optimum network detection. This is 
a key fact because many network detection algorithms 



involve solutions to Laplace's equation, however this 
constant solution does not distinguish between vertices 
at all, a deficiency that may be resolved in a variety 
of ways, yielding a family of network detection algo- 
rithms. Furthermore, the geometric multiplicity of the 
zero eigenvalue equals the number of connected compo- 
nents of the graph, though because a connected graph 
is implicit for the subgraph detection problem, we may 
assume that the kernel of the graph Laplacian is simply 
the one-dimensional subspace (l) = {al:aGM}. 

in. Optimum Network Detection 

Two different optimality criteria are used for the 
two different strategies of network detection: various 
connectivity metrics are used for predictive methods, and 
detection performance is used for subject-based meth- 
ods. Detection optimality means, as usual, optimality 
in the Neyman-Pearson sense in which the probability 
of detection is maximized at a fixed false alarm rate. 
In the context of networks, the probability of detec- 
tion (PD) refers to the fraction of vertices detected 
belonging to the threat subgraph, and the probability of 
false alarm (PEA) refers to the fraction of non-threat 



vertices detected. As in classical detection theory, |51| 
the optimal detector is a threshold of the log-likelihood 
ratio (LLR), and a new Bayesian framework for network 
detection is developed in this section. The distinction 
between classical detection theory and network detection 
theory is not in the form of the optimal detector — 
the log-likelihood ratio — but in distinct mathematical 
formulations. Whereas linear algebra is the foundation 
for classical detection theory, algebraic graph theory 



1 22 1 is the foundation for network detection. It follows 



that understanding the theory, algorithms, and results 
of network detection requires an introduction of some 
basic concepts from algebraic graph theory, especially 
the graph Laplacian and spectral analysis of graphs. 
Eamiliarization with these objects provides a common 
framework of comparing apparently unrelated network 
detection algorithms and provides deep insights into 
basic problems in network detection theory. 

A. Spectral-Based Community Detection 

Efficient graph partitioning algorithms and analysis 
appeared in the 1970s with Donath and Hoffman's 



eigenvalue-based bounds for graph partitioning 1 15| and 



Eiedler's connectivity analysis and graph partitioning 
algorithm [18], [19] which established the connection 
between a graph's algebraic properties and the spectrum 
of its Kirchhoff Laplacian matrix Q = D — A [Eq. ([T])]. 
The spectral methods in this section solve the graph 
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partitioning problem by optimizing various subgraph 
connectivity properties. 

The cut size of a subgraph — the number of edges 
necessary to remove to separate the subgraph from 
the graph — is quantified by the quadratic form s^Qs, 
where s = (±1, . . . , ±1)^ is a itl-vector who entries 
are determined by subgraph membership. |42] Minimiz- 
ing this quadratic form over s, whose solution is an 
eigenvalue problem for the graph Laplacian, provides 
a network detection algorithm based on the model of 
minimal cut size. However, there is a paradox in the 
application of spectral methods to network detection: the 
smallest eigenvalue of the graph Laplacian Ao(Q) = 
corresponds to the eigenvector 1 constant over all ver- 
tices, which fails to discriminate between subgraphs. 
Intuitively this degenerate constant solution makes sense 
because the two subgraphs with minimal (zero) subgraph 
cut size are the entire graph itself (s = 1), or the null 
graph (s = —1). This property manifests itself in many 
well-known results from complex analysis, such as the 
maximum principle. 

Fiedler showed that if rather the eigenvector corre- 
sponding to the second smallest eigenvalue Ai(Q) of Q 
is used (many authors write Ai = and A2 rather than 
the zero offset indexing Aq = and Ai used here), then 
for every nonpositive constant c < 0, the subgraph whose 
vertices are defined by the threshold ^1 > c is necessarily 
connected. This algorithm is called spectral detection. 
Given a graph G, the number Ai(Q) is called the 
Fiedler value of G, and the corresponding eigenvector 
^i(Q) is called the Fiedler vector. Completely analogous 
with comparison theorems in Riemannian geometry that 
relate topological properties of manifolds to algebraic 
properties of the Laplacian, many graph topological 
properties are tied to its Laplacian. For example, the 
graph's diameter D and the minimum degree dmin 
provide lower and upper bounds for the Fiedler value 
Ai(Q): A/{nD) < Ai(Q) < n/{n-l)-d^^. |36| 
This inequality explains why the Fiedler value is also 
called the algebraic connectivity: the greater the Fiedler 
value, the smaller the graph diameter, implying greater 
graph connectivity. If the normalized Laplacian L of 
Eq. ([2]) is used, the corresponding inequality involving 
the generalized eigenvalue Ai(L) = Ai(Q,D) involves 
the graph's diameter D and volume V: 1/{DV) < 
Ai(L) <n/(n-l). p| 



Because in practice spectral detection with its im- 
plicit assumption of minimizing the cut size often- 
times does not detect intuitively appealing subgraphs, 
Newman introduced the alternate criterion of subgraph 



mize the subgraph connectivity relative to background 
graph connectivity, which yields the quadratic maximiza- 
tion problem maxg s^Ms, where M = A — F^^dd^ 
is Newman's modularity matrix, A is the adjacency 
matrix, (d)^ = di is the degree vector, and V = l^d 



is the graph volume. |38| Newman's modularity-based 



graph partitioning algorithm, also called community de- 
tection, involves thresholding the values of the principal 



"modularity" for subgraph detection. |38| Rather than 
minimize the cut size, Newman proposes to maxi- 



eigenvector of M. Miller et al. 1 33 1-| 35 1 also consider 
thresholding arbitrary eigenvectors of the modularity 
matrix, which by the Courant minimax principle biases 
the Newman community detection algorithm to smaller 
subgraphs, a desirable property for many applications. 
They also outline an approach for exploiting observations 
within the spectral framework. |^33J 

B. Neyman-Pe arson Subgraph Detection 

Network detection of a subgraph within a graph 
G = {V, E) of order n is treated as n independent binary 
hypothesis tests to decide which of the graph's n vertices 
does not belong (null hypothesis Hq) or belongs (hy- 
pothesis Hi) to the network. Maximizing the probability 
of detection (PD) for a fixed probability of false alarm 
(PFA) yields the Neyman-Pearson test involving the log- 
likelihood ratio of the competing hypothesis. We will 
derive this test in the context of network detection, which 
both illustrates the assumptions that ensure detection 
optimality, as well as indicates practical methods for 
computing the log-likelihood ratio test and achieving 
an optimal network detection algorithm. It will be seen 
that a few basic assumptions yield an optimum test 
involving the graph Laplacian, which allows comparison 
of Neyman-Pearson testing to several other network 
detection methods whose algorithms are also related to 
the properties of the Laplacian. 

Assume that each vertex v ^ V has an unknown 
{0, l}-valued property which is considered to be 
"threat" or "non-threat" at v, and that there exists an 
observation vector z: {vi-^, . . . ,Vi^} C V ^ M C M.^ 
from k vertices to a measurement space M. For example, 
a direct observation of threat at vertex v may be repre- 
sented by the observation z{v) = 1. It is assumed that 
the observation z{v) at v and the threat 0„ at v are not 
independent, i.e. f(^z{v)\@y'j 7^ so that there is 

positive mutual information between z{v) and @y. The 
probability density f(^z{v)\@v) is called the observation 
model, which in this paper is treated as a simple {0, 1}- 
valued model 6[z{v) — 0^). Though the threat network 
hypotheses are being treated here independently at each 
vertex, this framework allows for more sophisticated 
global models that include hypotheses over two or more 
vertices. 
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An optimum hypothesis test is now derived for the 
presence of a network given a set of observations z. Op- 
timaUty is defined in the Neyman-Pearson sense in which 
the probability of detection is maximized at a constant 
false alarm rate (CFAR). As usual, | [5T| the derivation 
of the optimum test involves the procedure of Lagrange 
multipliers. For the general problem of network detection 
of a subgraph within graph G of order n, the decision 
of which of the 2" hypothesis = (6^,, . . . , G^J^ to 
choose involves a 2"-ary multiple hypothesis test over 
the measurement space of the observation vector z, and 
an optimal test involves partitioning the measurement 
space into 2" regions yielding a maximum PD. This NP- 
hard general combinatoric problem is clearly computa- 
tionally and analytically intractable; however, the general 
2"^-ary multiple hypothesis test may be greatly simplified 
by treating it as n independent binary hypothesis tests. 

At each vertex v £ G and unknown threat Q: V ^ 
{0,1} across the graph , consider the binary hypothesis 
test for the unknown value 



The second property yields the likelihood ratio (LR) test. 



Ho(v): e, 
Hi(v): 




1 



(vertex belongs to background) 
(vertex belongs to subgraph). 



(6) 



Given the observation vector z: {vi^, . . . ,^4} C V — 
M CM.^ with observation models f(^z{vi.)\©y^,), j = 1, 
. . . , k, the PD and PFA are given by the integrals 



PD= / /(z|e^ = i)dz, 

Jr 

PFA = / /(z|e^ = 0)dz, 
Jr 



(7) 
(8) 



where R C M is the detection region in which ob- 
servations are declared to yield the decision 0^, = 1, 
otherwise 0^, is declared to equal 0. The optimum 
Neyman-Pearson test uses the detection region R that 
maximizes PD at a fixed CFAR value PFAq. Posing 
this optimization problem over R with the method of 
Lagrange multipliers applied to the function 

F{R,X) = PB{R) - \{PFA{R) - PFAq), 

f{z\e^ = l)dz-X\ [ /(z|0^ = 0)fiz-PFAo 
R ^Jr 

= [ [/(z|0^ = l)-A/(z|0„ = O)](iz + APFAo 
Jr 

(9) 

yields two conditions to maximize F{R, A) over R 
and A: 

(i) A > 0, 

(ii) z G <^ /(z|0^ = 1) - A/(z|0^ = 0) > 0. 



/(z|0. 
/(z|0. 



0) Ho(i>) 



(10) 



that maximizes the probability of detection. As will be 
shown in the next section, the numerator /(z|0t, = 1) 
of Eq. ( fTO] ) is easily computed using standard Bayesian 
analysis, leading to a "threat propagation" algorithm for 
/(0^|z) and a connection to the Laplacian L(G) de- 
scribed in Section [nj and the denominator f{z\@y = 0) 
is determined by prior background information or simply 
the "principle of insufficient reason" [2Sj in which this 
term is a constant. 

Because the probability of detecting threat is maxi- 
mized at each vertex, the probability of detection for the 
entire subgraph is also maximized, yielding an optimum 
Neyman-Pearson test under the simplification of treating 
the 2'^-ary multiple hypothesis testing problem as a 
sequence of n binary hypothesis tests. Summarizing, the 
probability of network detection given an observation z 
is maximized by computing /(0„|z) using a Bayes- 
ian "threat propagation" method and applying a simple 
likelihood ratio test. The connectivity of the subgraph 
whose vertices exceed the threshold is assured by the 
maximum principle. Algorithms for computing /(0^,|z) 
are described next. 

C. Space-Time Threat Propagation 

Many important network detection applications, espe- 
cially networks based on vehicle tracks and computer 
communication networks, involve directed graphs in 
which the edges have departure and arrival times associ- 
ated with their initial and terminal vertices. Space-Time 
threat propagation is used compute the time-varying 
threat across a graph given one or more observations at 
specific vertices and times. ||40|, | [47| In such scenarios, 
the time-stamped graph G = {V,E) may be viewed as 
a space-time graph Gt = {V xT, Et) where T is the 
set of sample times and Et C x T]^ is an edge 
set determined by the temporal correlations between 
vertices at specific times. This edge set is application- 
dependent, but must satisfy the two constraints, (1) if 
{u{tk),v{ti)) G Et then {u,v) € E, and (2) temporal 
subgraphs (^{u,v), Et{u,v)) between any two vertices 
u and V are defined by a temporal model Et{u,v) C 
[T]JT]^. A concrete example for a specific dynamic 
model of threat propagation is provided below. 

1 ) Temporal Threat Propagation: Given an observed 
threat at a particular vertex and time, we wish to compute 
the inferred threat across all vertices and all times. This 
computation is a straightforward application of Bayesian 
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analysis that results in the optimum Neyman-Pearson 
network detection test developed above as well as an 
efficient algorithm for computing this test. Given a 
vertex v, denote the threat at v and at time t € M by 
the { 0, 1 }-valued stochastic process @v{t), with value 
zero indicating no threat, and value unity indicating a 
threat. Denote the probability of threat at f at t by 



i)=p{e,{t)). (11) 



The threat state at v is modeled by a finite-state contin- 
uous time Markov jump process between from state 1 to 
state with Poisson rate A^. With this simple model the 
threat stochastic process G„ (t) satisfies the Ito stochastic 
differential equation, 



-Q^dN^; 640) = 01, 



(12) 



where N^{t) is a Poisson process with rate A^, defined 
for positive time, and simple time -reversal provides the 
model for negative times. Given an observed threat 
z = Q^{0) = 1 at w at t = so that -dyiO) = 1, the 
probability of threat at v under the Poisson process 
model (including time-reversal) is 



7?„(t) = P(G,(t)|z = e,(0) = l) =e 



-A„|i| 



(13) 



This stochastic model provides a Bayesian framework 
for inferring, or propagating, threat at a vertex over time 
given threat at a specific time. The function 



-x.\t\ 



(14) 



of Eq. (13 1 is called the space-time threat kernel and 
when combined with spatial propagation provides a 
temporal model Et for a space-time graph. A Bayesian 
model for propagating threat from vertex to vertex will 
provide a full space-time threat propagation model and 
allow for the application of the optimum maximum 



likeUhood test of Eq. ( [T0| ). 

2 ) Spatial Threat Propagation: Propagation of threat 
from vertex to vertex is determined by tracks or con- 
nections between vertices. A straightforward Bayesian 
analysis yields nonlinear equations that determine the 
probability of threat at each vertex, and along with the 
assumptions of asymptotic independence and small prob- 
abilities these equations may be linearized and thereby 
easily analyzed and solved in regimes relevant to our 
applications. 

The threat at vertex v at which a single track r from 
vertex u arrives and/or departs at times t"^ and is 



determined by Eq. (13 1 and the (independent) event 
f n that threat traveled along this track: P(Gt,(t)) = 



transformation 



u). There is a linear 



/oo 
P{v ^ u)K{t-t^^)5{a -t'^)M'^)d(J (15) 
'OO 

from the threat probability at u to v. Discretizing time, 
the temporal matrix for the discretized operator has 
the sparse form 



. . . K{tk -OO ... 



(16) 



where represents an all-zero column, represents a 
vector of discretized time, and the discretized function 
K{tk — O appears in the column corresponding to the 
discretized time at t". Threat propagating from vertex v 
to u along the same track r is given by the comparable 
expression i^uit) = 'dv{t%)K{t — t^), whose discretized 
linear operator takes the form 



... QK{tk-OQ ... 



(17) 



[cf. Eq. (16l] where the nonzero column corresponds 
to i^. The sparsity of K™ and K^" will be essential for 
practical space-time threat propagation algorithms. 

It will now be shown how threats arriving on other 
tracks from other vertices may be sequentially linearized. 
If the threat on the track r from vertex u must be 
combined with an existing threat ^v{t) at v, then the 
combined threat -dvif^) = P{&v{t^)) at v at time 
immediately after/before the track from u arrives/departs 
at time t is determined by the addition law of probability. 



p{e^{t)ueu{t){v ^u)) 

= p{e,{t)) + p{euit)iv ^ u)) 

-P{Q,ityQu{t)iv^u)). 



(18) 



Under the two assumptions that (1) the threat events 0„ 
and Gt, at u and v are independent, asymptotically valid 
for large time differences relative to the Poisson time A~.^ 
for an observation at vertex v*, |47| and (2) the threat 
probabilities P(Gu(t)) and P(G^(i)) are numerically 
small, Eq. ([TSj) yields the linear approximation 



p{e,{t)) +p{eu{t){v ^u)) 

= Mt) + Mt)P{v^u). (19) 



Extending this analysis to multiple tracks and assuming 
that P{v ^ u)~^ cx w{v) for some weight function 
w: V — )■ M of the vertices, e.g., the degree of each 
vertex, yields the threat propagation equation 



i9 = D-^Ai? 



(20) 



g 
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where i9 is the (discretized) space-time vector of threat 
probabilities, the weighted space-time adjacency matrix 







uv 

Tl 







(21) 



is defined 

diag(t(;(fi)I, 



by Eq. ([16i, and 
. . ,w{vn)i)- Eq. (20l, written as Li9 



_ 0, 
connects the asymmetric Laplacian matrix of Eq. Q 
with threat propagation, the solution of which itself 
may be viewed as a boundary value problem with the 
harmonic operator L. 

Given a cue at vertices Vb^, . . . , vi,^ , the harmonic 
space-time threat propagation equation is 



LiiL 



ib 







(22) 



where the space-time Laplacian 
the space-time threat vector -d 



and 
have been 



permuted so that cued vertices are in the 'b' blocks 
(the "boundary"), non-cued vertices are in 'i' blocks (the 
"interior"), and the cued space-time vector i9b is given. 
The harmonic threat is the solution to Eq. p2]). 



(23) 



The space-time Laplacian of Eq. Q is a directed Lapla- 
cian matrix, and that Eq. ( [22] ) is directly analogous to 
Laplace's equation Ac/? = given a fixed boundary con- 
dition. As discussed in the next subsection, the connec- 
tion between space-time threat propagation and harmonic 
graph analysis also provides a link to spectral-based 
methods for network detection. The nonnegativity of the 



harmonic threat of Eq. ( 23 1 is guaranteed because the 
space-time adjacency matrix A and cued threat vector i9b 
are both nonnegative. This highly sparse linear system 
may be solved by the biconjugate gradient method, 
which provides a practical computational approach that 
scales well to graphs with thousands of vertices and thou- 
sands of time samples, resulting in space-time graphs 
of order ten million or more. In practice, significantly 
smaller subgraphs are encountered in applications such 
as threat network discovery |f46l, for which linear solvers 
with sparse systems are extremely fast. 

Finally, a simple application of Bayes' theorem to 
the harmonic threat -dy = /(0^|z) provides the optimum 



Ney man-Pearson detector [Eq. (10)] developed in Sec- 
tion IIII-BI because 



/(z|e. 
/(z|e. 



i) 

0) 



jz) /(e. = 
o|z) ■ /(e. = 
/(e. = 1) 



1) 



/(G. = 0|z) /(G. = 0) Ho(.) 



A, (24) 



results in a threshold of the harmonic space-time threat 
propagation vector 



Hi 

i9 ^ threshold, 

Ho 



(25) 



possibly weighted by a nonuniform null distribu- 
tion f{@v = 0|z), with the normalizing constant 
f{Qv = 1)//(G„ = 0) being absorbed into the detection 
threshold. This establishes, under the assumptions and 
approximations enumerated above, the detection opti- 
mality of harmonic space-time threat propagation. 

D. Insights from Spectral Graph Theory 

Each network detection algorithm above can be com- 
pared to each other by different approaches taken to 
address the problem posed by the (physical) fact that 
the smallest eigenvalue of the graph Laplacian is zero: 
Ql = 0-1. Fiedler's spectral detection, which minimizes 
the network cut size, thresholds the eigenvector cor- 
responding to the second smallest eigenvalue of the 
Laplacian — the Fiedler value. In contrast, community 
detection, which maximizes the subgraph connectivity 
relative to the background, recasts the objective of spec- 
tral detection resulting in a threshold of the principal 
or other eigenvectors of Newman's modularity matrix 
M = A — V~^dd^ . Alternatively, threat propagation, 
which maximizes the Bayesian probability of detection 
by computing the harmonic solution to Laplace's equa- 
tion, Li9 = 0, but treats this as a boundary value problem 
with observations representing the boundary values and 
unknown values representing the interior. 

E. Computational Complexity 

Depending upon sparsity, the computational complex- 
ity of spectral methods ranges from 0{n\ogn)-0{-n?) 
for principal eigenvector methods [38] to O(n^logn)- 
O(n^) for methods that rely on full eigensolvers f^^- 
p5) , with the lower cost exhibited with graphs whose 
average degree is over logn, below which a random 
Erdos-Renyi graph is almost surely disconnected. [1^ 
The cost of harmonic methods is about O(nlogn)- 
0{v?) for sparse matrix inversion and also depends 
upon the graph's sparsity. In practice, Arnoldi iteration 
can be used for sparse eigenvalue computation and the 
biconjugate gradient method can be used for sparse 
matrix inversion. 

IV. Network Detection Performance 

There are two ways to demonstrate network detection 
performance: empirical and theoretical, both of which 
depend on detailed knowledge of network behavior and 
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Fig. 2. Bayesian generative model for the network simulation with N nodes, K communities, and L "lifestyles" (distributions of community 
participation). Shaded squares are model parameters for tuning and circles are variables drawn during simulation. 



dynamics. But full knowledge of real-world covert net- 
work behavior including relationships to the background 
network is, by design, extraordinarily rare or nonexistent, 
though partial information about many covert networks 
has been integrated over time [57 1. Predicting perfor- 
mance of network detection methods requires details 
of the interconnectivity of both the foreground and 
background networks. Empirical detection performance 
is demonstrated using either a real-world or simulated 
dataset for which the truth is at least partially known, and 
theoretical performance predictions are derived based 
upon statistical assumptions about the foreground and 
background networks. To date, closed-form analytic per- 
formance predictions have been accomplished for very 
simple network models, i.e. cliques pO| , | [3T| , pT} 
or dense subgraphs [4] embedded within Erdos-Renyi 
backgrounds, and there are no theoretical results at all 
for space-time graphs or realistic models appropriate 
for covert networks. Therefore, realistic models are es- 
sential for performance analysis of network detection 
algorithms. There are two basic approaches to modeling 
networks: stochastic models, which attempt to capture 
the aggregate statistical properties of networks, and 
agent-based models, which attempt to describe specific 
behaviors. In general, stochastic models have greater 
tractability because they do not rely on the detailed 
description of actions or objectives of a specific network. 

The empirical detection performance of the covert 
network detection algorithms described above will be 
computed using a Monte-Carlo analysis based upon a 
new stochastic blockmodel. Empirical performance pre- 
dictions may be also based on a single dataset, oftentimes 
a practical necessity for real-world measurements. Detec- 
tion performance for specific, real-world single datasets 
is illustrated in an accompanying paper. | [59| 



A. Covert Network Stochastic Blockmodel 

To adhere with observed phenomenology of real- 
world networks, realistic network models should exhibit 
properties including connectedness, a power-law degree 
distribution (the "small world" property), membership- 
based community structure, sparsity, and temporal co- 
ordination. No one simple network model captures all 
these traits, e.g. Erdos-Renyi graphs can be almost 
surely connected, though do not exhibit a power-law 
density, power-law models such as R-MAT |9| do not ex- 
hibit a membership-based network structure, and mixed- 
membership stochastic blockmodels [2 J do not include 
temporal coordination. To achieve a realistic network 
model possessing this range of properties, we propose a 
new statistical-based model with parameterized control 
over the generation of interactions between network 
nodes. The proposed model is depicted in Fig. [2] using 
plate notation. 

1) Spatial Stochastic Blockmodel: The proposed 
model may be viewed as an aggregation of the several 
simpler models of which it is comprised: Erdos-Renyi 
(dominant at low degrees), fl^ Chung-Lu (dominant at 
high degrees), [1] and a mixed-membership blockmodel 
that models community interactions. Q The overall 
network model is approximated by each of the simpler 
models in the regime where the simple model dominates. 
The Erdos-Renyi model defines the overall sparsity and 
connectivity. The Chung-Lu model creates a power-law 
degree distribution empirically consistent with a broad 
range of real-world networks. The stochastic blockmodel 
creates distinct communities each with their own param- 
eterized interaction models. 

The space-time graph of the proposed mixed- 
membership stochastic blockmodel is determined by a 
connectivity model and temporal model. Let N be the 
total number of nodes, and K be the number of com- 
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Fig. 3. Adjacency matrix of a stochastic blockmodel with a 

foreground subgraph whose intra-activity is 50% more than of all pig. 4. Graph of the adjacency matrix shown in Figure |3] The 
other subgraphs. foreground graph and intra subgraph edges are shown in red. 



munities. Each node divides its time among at least one 
of the several K communities, and the number of ways 
in which a node distributes its time among the different 
communities is discretized into L distinct "lifestyles." 
Each node is assigned to a specific lifestyle. For example, 
nodes 1 and 3 may spend all their time in community 1, 
thereby sharing the same lifestyle, whereas node 2 may 
spend half its time in community 1 and half in commu- 
nity 2, and therefore occupies another lifestyle, and so 
forth. The rate of interactions between nodes i = 1 
and j is given by the product 

A., = • • zr_,,Bz,^„ (26) 

where the first term if- represents the (modified) Erdos- 
Renyi model, the second term / (^k^k) represents 
the Chung-Lu model, and the third term zJ_^j'Bzi^j 
represents the stochastic blockmodel. 

At each node-to-node interaction, a random draw from 
a multinomial distribution determines the community to 
which each node belongs. The indicator function Ip- is a 
sparse i^-by-i^ (0, l)-matrix whose entries are binomial 
random variables with probability {S)ab for node i in 
community a and node j in community b. An Erdos- 
Renyi sparsity model has (S)^;, = p for all communities 
a and b, whereas this modified sparsity model allows 
the possible of differing interaction rates within and 
across communities. The Chung-Lu term XiXj/ A^) 
is determined by the per-node expected degrees \i,i = l, 
. .., N, which are themselves drawn from a power-law 
distribution of parameter ex G M^. The blockmodel term 
zJ^jBzi^j is determined by B, a K-hy-K matrix of the 
rate of interaction between communities, and Zi^j G 



is an indicator (01) -vector is the community to which 
node i belongs when interacting with node j. This 
community is the same over the entire simulation and is 
drawn from a multinomial over tTj G M^, node i's distri- 
bution over communities. Finally, the distribution of tt is 
drawn from a Dirichlet r.v. with concentration parameter 
IJX. Node i's lifestyle, Ij is a multinomial draw with the 
lifestyle probability (p G M^. The adjacency matrix and 
graph of this model are illustrated in Figs. |3] and |4] using 
an example with a mixed community with a higher level 
of activity for the foreground network. 

2) Temporal Stochastic Blockmodel: The meeting 
times for each interaction are chosen independently of 
the spatial model. Real-world interactions are often coor- 
dinated, with many individuals arriving or leaving from 
a location at a set of pre-defined times. This behavior is 
parameterized by an average number of meeting times 
* G for each community. The simulated number 
of meeting times is a Poisson r.v. (offset by 1) with 
Poisson parameter ^' — 1. E.g. an expected number of 
meeting times {^)k = 1 (for community k) yields a 
constant Poisson r.v. of 1 meeting time (in Matlab, 
poissrnd(O) = 0), thereby yielding a community 
whose activities are tightly coordinated because there 
is only a single time for the members to meet. An 
expected number of meeting times {'^)k = 20 yields 
a community whose activities are loosely coordinated 
because meetings may occur at any one of a number 
of times. The meetings times themselves are chosen 
uniformly over time, and each node arrives at the meet- 
ing time perturbed by a zero-mean Gaussian r.v. with a 
parameterized variance. 
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50 100 

PFA (%) 

Fig. 5. Receiver operating characteristics versus forground co- 
ordination (^foreground = 1-5, high Coordination, and 20, low 
coordination) for space-time threat propagation (STTP) and spectral- 
based community detection (SPEC). The community activity level 
Sk = 1-logA'^k/Afk for all communities. [1000 Monte Carlo trials.] 

B. Network Detection Results 

The detection performance of the network detection 
algorithms described above is presented in this section 
using empirical Monte Carlo results applied to the 
mixed-membership stochastic blockmodel. A space-time 
graph is chosen independently for each Monte Carlo 
trial. A set of baseline parameters is chosen to achieve 
realistic foreground and background networks of specific 
sizes, and excursions are performed on the parameters 
controlling foreground coordination and foreground ac- 
tivity. The performance metric is the standard receiver 
operating characteristic (ROC), which in the case of net- 
work detection is the probability of detection (measured 
as the percentage of true foreground nodes detected) 
versus the number or percentage of false alarms (the 
number of background nodes detected) as the detection 
threshold is varied. Perfect ROC performance is a 100% 
detection rate with a 0% false alarm rate, and the worst 
possible performance is a detection rate equal to chance, 
i.e. equal to the false alarm rate. 

1) Baseline Model: A baseline model is used com- 
prised of eleven lifestyles spanning ten communities. 
Two of the lifestyles are designated as foreground 
lifestyles and all others are "background." As detail 
above, each lifestyle has a propensity toward a different 
mix of community activity. The background lifestyles 
have a power-law distribution of membership over the 
background communities, which may be imagined to 
represent business, homes, industry, religion, sports, 
or other social interactions. Two distinct foreground 
lifestyles are used to model the compartmentalization 




PFA (%) 

Fig. 6. Receiver operating characteristics versus foreground activity 
(Sfg = 2- log Affg/A^fg, high activity, and 1- log A^fg/A^fg, baseline 
activity) for space-time threat propagation (STTP) and spectral-based 
community detection (SPEC). The foreground coordination level is 
specified by = 20 average number of meeting times. [1000 
Monte Carlo trials.] 



of real-world covert networks. One foreground lifestyle 
associates uniformly across background communities, 
whereas the other foreground lifestyle has a strong 
association with a only small subset of background com- 
munities. These foreground lifestyles may be imagined 
to represent specialized functions or activities within the 
covert network. As in real life, the foreground lifestyles 
comprise only a tiny fraction of the entire population. 

Interactions in which two nodes belong to the same 
community occur at a higher rate than interactions 
of nodes belonging to different communities. This is 
modeled by specifying that the block matrix B be di- 
agonally dominant, perhaps strongly. Furthermore, real- 
world communities are not disconnected, thus for a 
community size of N^, the diagonals of the Erdos-Renyi 
sparsity parameter matrix Sjt must be at least log N^/Nk 
to ensure that each community is almost surelycon- 
nected |[T6|. Finally, covert networks necessarily have 
sparse — not clique-like — structure, thus the Erdos-Renyi 
sparsity parameter S for the covert network must also be 
low. 

2 ) Detection versus Foreground Coordination and Ac- 
tivity: Two nominal values are chosen for foreground 
coordination and foreground activity, then both space- 
time threat propagation and spectral-based community 
detection algorithms are applied using a randomized cue 
over 1000 Monte Carlo trials. Fig. [5] shows the detec- 
tion performance of both algorithms as the foreground 
coordination changes from a high of = 1.5 average 
number of meeting times to a low of 'J'fg = 20. As 
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predicted, the detection performance of space-time threat 
propagation improves as the temporal coordination of 
the foreground network increases. The optimality of 
this Bayesian network detector is predicated on tem- 
poral coordination, and decreased coordination makes 
the foreground network more difficult to detect. This 
example uses a constant baseline level of community 
activity (sparsity matrix Sfg = 1- log A'^fg/A^fg), thus 
the optimality assumption of high foreground activity 
made by spectral-based community detection algorithm 
is violated, and as expected this spectral algorithm does 
no better than chance for either coordination level. 
Fig. |6] shows the detection performance of both algo- 
rithms as the foreground activity changes from Sfg = 
1- log N{g/N{g (baseline activity) to Sfg = 2- log N{g/Nfg 
(high activity). The foreground coordination level is low, 
at 'J'fg = 20, providing an example for which none of 
the basic algorithmic assumptions hold for either space- 
time threat propagation or spectral-based community 
detection. The low foreground activity results, Sfg = 
1- log N{g/N{g, are replicated in this figure from Fig. [S] 
in which STTP yields moderate detection performance 
and spectral-based community detection is no better than 
chance. At high foreground activity the foreground net- 
work is detectable at by both spectral-based community 
detection and space-time threat propagation. 

V. Conclusions 

The problem of covert network detection is analyzed 
from the perspectives of graph partitioning and alge- 
braic graph theory. Network detection is addressed as a 
special case of graph partitioning in which membership 
in a small subgraph of interest must be determined, 
and a common framework is developed to analyze and 
compare different network detection methods. A new 
Bayesian network detection framework called space- 
time threat propagation is introduced that partitions the 
graph based on prior information and direct observations. 
Space-time threat propagation is shown to be optimum 
in the Neyman-Pearson sense subject to the assumption 
that threat networks are connected by edges temporally 
correlated to a cue or observation. Bayesian space-time 
threat propagation is interpreted as the solution to a 
harmonic boundary value problem on the graph, in which 
a linear approximation to Bayes' rule determines deter- 
mines the unknown probability of threat on the uncued 
nodes (the "interior") based on threat observations at cue 
nodes (the "boundary"). This new method is compared 
to well-known spectral methods by examining competing 
notions of network detection optimality. Finally, a new 
generative mixed-membership stochastic blockmodel is 
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introduced for performance prediction network detec- 
tion algorithms. The parameterized model combines key 
real- world aspects of several random graph models: 
Erdos-Renyi for sparsity and connectivity, Chung-Lu for 
power-law degree distributions, and a mixed-membership 
stochastic blockmodel for distinctive community-based 
interaction and dynamics. This model is used to compute 
empirical detection performance results for the detection 
algorithms described in the paper as both foreground 
coordination and activity levels are varied. Though the 
results in the paper are empirical, it is our hope that both 
the paper's analytic results and performance modeling 
will be useful in future closed-form analysis of real- 
world covert network detection problems. 
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